runTask(context: TaskContext): MapStatus
ShuffleMapTask
ShuffleMapTask
is a Task that writes records of a RDD partition to the shuffle system.
Caution
|
FIXME What RDD is ShuffleMapTask created for?
|
Note
|
Spark uses broadcast variables to send (serialized) tasks to executors. |
Creating ShuffleMapTask
Instance
Caution
|
FIXME |
Writing Records From RDD Partition to Shuffle System — runTask
Method
Note
|
runTask is a part of Task contract to…FIXME
|
runTask
returns a MapStatus (with the location and estimated size of the result RDD block) after the records of the Partition were written to the shuffle system.
Internally, runTask
uses the current closure Serializer
to deserialize the taskBinary
serialized task (into a pair of RDD and ShuffleDependency).
runTask
measures the thread and CPU time for deserialization (using the System clock and JMX if supported) and stores it in _executorDeserializeTime
and _executorDeserializeCpuTime
attributes.
Note
|
runTask uses SparkEnv to access the current closure Serializer .
|
Note
|
The taskBinary serialized task is given when ShuffleMapTask is created.
|
runTask
requests ShuffleManager
for a ShuffleWriter
(given the ShuffleHandle
of the deserialized ShuffleDependency
, RDD partition and input TaskContext).
Note
|
runTask uses SparkEnv to access the current ShuffleManager .
|
Note
|
The partition is given when ShuffleMapTask is created.
|
runTask
computes the records in the RDD partition and writes them (to the shuffle system).
Note
|
This is the moment in Task 's lifecycle (and its RDD) when a RDD partition is computed and hence becomes a sequence of records (i.e. real data).
|
runTask
stops the ShuffleWriter
(with success
flag enabled) and returns the MapStatus
.
When the record writing was not successful, runTask
stops the ShuffleWriter
(with success
flag disabled) and the exception is re-thrown.
You may also see the following DEBUG message in the logs when the ShuffleWriter
could not be stopped.
DEBUG Could not stop writer